AITopics | text box

Information about AI from the News, Publications, and Conferences

Automatic Classification – Tagging and Summarization – Customizable Filtering and Analysis

If you are looking for an answer to the question What is Artificial Intelligence? and you only have a minute, then here's the definition the Association for the Advancement of Artificial Intelligence offers on its home page: "the scientific understanding of the mechanisms underlying thought and intelligent behavior and their embodiment in machines."

However, if you are fortunate enough to have more than a minute, then please get ready to embark upon an exciting journey exploring AI (but beware, it could last a lifetime) …

MAQuA: Adaptive Question-Asking for Multidimensional Mental Health Screening using Item Response Theory

Varadarajan, Vasudha, Xu, Hui, Boehme, Rebecca Astrid, Mirstrom, Mariam Marlan, Sikstrom, Sverker, Schwartz, H. Andrew

arXiv.org Artificial Intelligence | Nov-21-2025

Recent advances in large language models (LLMs) offer new opportunities for scalable, interactive mental health assessment, but excessive querying by LLMs burdens users and is inefficient for real-world screening across transdiagnostic symptom profiles. We introduce MAQuA, an adaptive question-asking framework for simultaneous, multidimensional mental health screening. Combining multi-outcome modeling on language responses with item response theory (IRT) and factor analysis, MAQuA selects the questions with the most informative responses across multiple dimensions at each turn to optimize diagnostic information, improving accuracy and potentially reducing response burden. Empirical results on a novel dataset reveal that MAQuA reduces the number of assessment questions required for score stabilization by 50-87% compared to random ordering (e.g., achieving stable depression scores with 71% fewer questions and eating disorder scores with 85% fewer questions). MAQuA demonstrates robust performance across both internalizing (depression, anxiety) and externalizing (substance use, eating disorder) domains, with early stopping strategies further reducing patient time and burden. These findings position MAQuA as a powerful and efficient tool for scalable, nuanced, and interactive mental health screening, advancing the integration of LLM-based agents into real-world clinical workflows.
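The core selection step described above, asking next whichever question is expected to be most informative across several dimensions at once, can be illustrated with textbook multidimensional IRT. The sketch below assumes a multidimensional two-parameter logistic (M2PL) model in which each item has a discrimination vector `a` and an intercept `d`, and it scores items by the trace of their Fisher information at the current trait estimate. It is not MAQuA's actual model (which works on free-text language responses with multi-outcome modeling and factor analysis); the item bank, its values, and all function names are hypothetical.

```python
import math

# Hypothetical item bank: name -> (discrimination vector a, intercept d).
# Dimension 0 = depression, dimension 1 = anxiety; values are illustrative only.
ITEMS = {
    "sleep": ((1.5, 0.2), 0.0),
    "worry": ((0.3, 1.8), -0.5),
    "appetite": ((0.4, 0.4), 0.0),
}


def prob_endorse(theta, a, d):
    """M2PL IRT: probability of endorsing an item given latent traits theta."""
    z = sum(ai * ti for ai, ti in zip(a, theta)) + d
    return 1.0 / (1.0 + math.exp(-z))


def item_information(theta, a, d):
    """Trace of the item's Fisher information matrix: P(1 - P) * ||a||^2."""
    p = prob_endorse(theta, a, d)
    return p * (1.0 - p) * sum(ai * ai for ai in a)


def next_question(theta, items, asked):
    """Return the unasked item with the most total information, or None if all were asked."""
    candidates = [name for name in items if name not in asked]
    if not candidates:
        return None
    return max(candidates, key=lambda name: item_information(theta, *items[name]))


def standard_errors(theta, items, asked, prior_info=1.0):
    """Per-dimension standard errors from the diagonal of accumulated information.

    Using only the diagonal (ignoring cross-dimension terms) is a simplification.
    """
    info = [prior_info] * len(theta)
    for name in asked:
        a, d = items[name]
        p = prob_endorse(theta, a, d)
        for k in range(len(theta)):
            info[k] += p * (1.0 - p) * a[k] ** 2
    return [1.0 / math.sqrt(v) for v in info]
```

An early-stopping rule in the spirit of the abstract would keep calling `next_question` until every entry of `standard_errors` falls below a chosen threshold; a real system would also re-estimate `theta` after each answer, which this sketch leaves out.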

descriptive word, large language model, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2508.07279

Country:

Europe (0.93)
North America > United States > New Mexico (0.14)

Genre: Research Report > New Finding (1.00)

Industry:

Health & Medicine > Therapeutic Area > Psychiatry/Psychology > Addiction Disorder (1.00)
Health & Medicine > Therapeutic Area > Neurology > Attention Deficit/Hyperactivity Disorder (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

Overcoming Vision Language Model Challenges in Diagram Understanding: A Proof-of-Concept with XML-Driven Large Language Models Solutions

Shiinoki, Shue, Koshihara, Ryo, Motegi, Hayato, Morishige, Masumi

arXiv.org Artificial Intelligence | Feb-5-2025

Diagrams play a crucial role in visually conveying complex relationships and processes within business documentation. Despite recent advances in Vision-Language Models (VLMs) for various image understanding tasks, accurately identifying and extracting the structures and relationships depicted in diagrams continues to pose significant challenges. This study addresses these challenges by proposing a text-driven approach that bypasses reliance on VLMs' visual recognition capabilities. Instead, it utilizes the editable source files--such as xlsx, pptx or docx--where diagram elements (e.g., shapes, lines, annotations) are preserved as textual metadata. In our proof-of-concept, we extracted diagram information from xlsx-based system design documents and transformed the extracted shape data into textual input for Large Language Models (LLMs). This approach allowed the LLM to analyze relationships and generate responses to business-oriented questions without the bottleneck of image-based processing. Experimental comparisons with a VLM-based method demonstrated that the proposed text-driven framework yielded more accurate answers for questions requiring detailed comprehension of diagram structures. The results obtained in this study are not limited to the tested .xlsx files but can also be extended to diagrams in other documents with source files, such as Office pptx and docx formats. These findings highlight the feasibility of circumventing VLM constraints through direct textual extraction from original source files. By enabling robust diagram understanding through LLMs, our method offers a promising path toward enhanced workflow efficiency and information analysis in real-world business scenarios.
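To make the idea of shapes "preserved as textual metadata" concrete: an .xlsx file is a ZIP package, and shapes drawn on a sheet are stored as DrawingML XML in parts under `xl/drawings/`. The sketch below reads those parts with the standard library and emits one line per shape and per connector, the kind of plain text that could then be handed to an LLM. It is an assumption-laden illustration, not the authors' pipeline: it only handles `xdr:sp` shapes and `xdr:cxnSp` connectors whose endpoints are attached to shapes, and the output line format and the `build_sample_xlsx` helper (which writes only a drawing part, not a file Excel would open) are hypothetical.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# DrawingML namespaces used in xl/drawings/*.xml inside an .xlsx package
NS = {
    "xdr": "http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}


def shape_text(sp):
    """Concatenate all text runs (a:t) inside a shape's text body."""
    return " ".join(t.text or "" for t in sp.iterfind(".//a:t", NS)).strip()


def diagram_to_text(xlsx_file):
    """Turn shapes and connectors from every drawing part into plain-text lines."""
    lines = []
    with zipfile.ZipFile(xlsx_file) as zf:
        parts = sorted(n for n in zf.namelist()
                       if n.startswith("xl/drawings/") and n.endswith(".xml"))
        for part in parts:
            root = ET.fromstring(zf.read(part))
            labels = {}  # shape ids are only unique within one drawing part
            for sp in root.iterfind(".//xdr:sp", NS):
                props = sp.find("xdr:nvSpPr/xdr:cNvPr", NS)
                label = shape_text(sp) or props.get("name")
                labels[props.get("id")] = label
                lines.append(f"SHAPE {props.get('id')}: {label}")
            # Connectors record the ids of the shapes they start and end at
            for cxn in root.iterfind(".//xdr:cxnSp", NS):
                props = cxn.find("xdr:nvCxnSpPr/xdr:cNvCxnSpPr", NS)
                if props is None:
                    continue
                start, end = props.find("a:stCxn", NS), props.find("a:endCxn", NS)
                if start is not None and end is not None:
                    s, e = start.get("id"), end.get("id")
                    lines.append(f"EDGE: {labels.get(s, s)} -> {labels.get(e, e)}")
    return "\n".join(lines)


SAMPLE_DRAWING = f"""<xdr:wsDr xmlns:xdr="{NS['xdr']}" xmlns:a="{NS['a']}">
  <xdr:twoCellAnchor><xdr:sp>
    <xdr:nvSpPr><xdr:cNvPr id="2" name="Rectangle 1"/><xdr:cNvSpPr/></xdr:nvSpPr>
    <xdr:txBody><a:bodyPr/><a:p><a:r><a:t>Web Server</a:t></a:r></a:p></xdr:txBody>
  </xdr:sp></xdr:twoCellAnchor>
  <xdr:twoCellAnchor><xdr:sp>
    <xdr:nvSpPr><xdr:cNvPr id="3" name="Rectangle 2"/><xdr:cNvSpPr/></xdr:nvSpPr>
    <xdr:txBody><a:bodyPr/><a:p><a:r><a:t>Database</a:t></a:r></a:p></xdr:txBody>
  </xdr:sp></xdr:twoCellAnchor>
  <xdr:twoCellAnchor><xdr:cxnSp>
    <xdr:nvCxnSpPr>
      <xdr:cNvPr id="4" name="Straight Arrow Connector 3"/>
      <xdr:cNvCxnSpPr><a:stCxn id="2" idx="3"/><a:endCxn id="3" idx="1"/></xdr:cNvCxnSpPr>
    </xdr:nvCxnSpPr>
  </xdr:cxnSp></xdr:twoCellAnchor>
</xdr:wsDr>"""


def build_sample_xlsx():
    """Hypothetical helper: an in-memory ZIP holding only one drawing part."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("xl/drawings/drawing1.xml", SAMPLE_DRAWING)
    buf.seek(0)
    return buf
```

On the sample package, `diagram_to_text(build_sample_xlsx())` yields two `SHAPE` lines and one `EDGE: Web Server -> Database` line. Processing all shapes before connectors lets edges be reported by shape label regardless of element order in the XML.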

information, large language model, machine learning, (15 more...)

arXiv.org Artificial Intelligence

2502.04389

Country:

Asia > Japan (0.04)
North America > United States > Florida > Miami-Dade County > Miami (0.04)
Europe > Switzerland (0.04)
(3 more...)

Genre: Research Report > New Finding (0.67)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)

PC Agent: While You Sleep, AI Works -- A Cognitive Journey into Digital World

He, Yanheng, Jin, Jiahe, Xia, Shijie, Su, Jiadi, Fan, Runze, Zou, Haoyang, Hu, Xiangkun, Liu, Pengfei

arXiv.org Artificial Intelligence | Dec-23-2024

Imagine a world where AI can handle your work while you sleep: organizing your research materials, drafting a report, or creating a presentation you need for tomorrow. However, while current digital agents can perform simple tasks, they are far from capable of handling the complex real-world work that humans routinely perform. We present PC Agent, an AI system that demonstrates a crucial step toward this vision through human cognition transfer. Our key insight is that the path from executing simple "tasks" to handling complex "work" lies in efficiently capturing and learning from human cognitive processes during computer use. To validate this hypothesis, we introduce three key innovations: (1) PC Tracker, a lightweight infrastructure that efficiently collects high-quality human-computer interaction trajectories with complete cognitive context; (2) a two-stage cognition completion pipeline that transforms raw interaction data into rich cognitive trajectories by completing action semantics and thought processes; and (3) a multi-agent system combining a planning agent for decision-making with a grounding agent for robust visual grounding. Our preliminary experiments in PowerPoint presentation creation reveal that complex digital work capabilities can be achieved with a small amount of high-quality cognitive data: PC Agent, trained on just 133 cognitive trajectories, can handle sophisticated work scenarios involving up to 50 steps across multiple applications. This demonstrates the data efficiency of our approach, highlighting that the key to training capable digital agents lies in collecting human cognitive data. By open-sourcing our complete framework, including the data collection infrastructure and cognition completion methods, we aim to lower the barriers for the research community to develop truly capable digital agents.
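The division of labor in point (3), where one agent decides *what* to do next and another works out *where* on screen to do it, can be sketched as a simple control loop. In PC Agent both roles are learned models; here the planner is a hypothetical stand-in that replays a fixed plan, and the grounder is a hypothetical string match against a list of on-screen elements. Only the interface between the two, and a step cap matching the 50-step scenarios mentioned above, is the point of the sketch.

```python
from dataclasses import dataclass


@dataclass
class UIElement:
    """An on-screen element as a grounding agent might see it (hypothetical)."""
    label: str
    x: int
    y: int


def scripted_planner(steps):
    """Hypothetical stand-in for the planning agent: replays a fixed plan.

    The returned function maps the action history to the next high-level
    action such as "click New Slide", or None when the plan is finished.
    """
    def plan(history):
        return steps[len(history)] if len(history) < len(steps) else None
    return plan


def ground(action, elements):
    """Hypothetical grounding agent: map '<verb> <label>' to screen coordinates."""
    verb, _, target = action.partition(" ")
    for el in elements:
        if el.label.lower() == target.lower():
            return (verb, el.x, el.y)
    raise LookupError(f"no on-screen element matches {target!r}")


def run_agent(plan, elements, max_steps=50):
    """Alternate planning and grounding until the planner stops or the cap is hit."""
    history = []
    for _ in range(max_steps):
        action = plan(history)
        if action is None:
            break
        history.append(ground(action, elements))
    return history
```

For example, with elements `New Slide` at (40, 12) and `Title` at (300, 200), the plan `["click New Slide", "click Title"]` grounds to `[("click", 40, 12), ("click", 300, 200)]`. Keeping the planner's output as text descriptions is what lets the two roles be trained or swapped independently.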

artificial intelligence, deep learning, machine learning, (17 more...)

arXiv.org Artificial Intelligence

2412.17589

Country:

Asia > Japan > Honshū > Chūbu > Toyama Prefecture > Toyama (0.04)
Asia > China > Shanghai > Shanghai (0.04)

Genre:

Workflow (1.00)
Research Report (1.00)

Industry: Information Technology (0.46)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning > Agents (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

HEDS 3.0: The Human Evaluation Data Sheet Version 3.0

Belz, Anya, Thomson, Craig

arXiv.org Artificial Intelligence | Dec-10-2024

This paper presents version 3.0 of the Human Evaluation Datasheet (HEDS). This update is the result of our experience using HEDS in the context of numerous recent human evaluation experiments, including reproduction studies, and of feedback received. Our main overall goal was to improve clarity, and to enable users to complete the datasheet more consistently and comparably. The HEDS 3.0 package consists of the digital data sheet, documentation, and code for exporting completed data sheets as LaTeX files, all available from the HEDS GitHub.
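As a rough illustration of what "exporting completed data sheets as LaTeX files" involves, the sketch below turns question/answer pairs into a LaTeX `tabular`, escaping LaTeX's special characters so free-text answers compile. It is not the HEDS exporter: the dictionary input, the two-column layout, and the example question text are assumptions made for illustration.

```python
# Characters that must be escaped in LaTeX text mode, mapped to safe forms.
LATEX_SPECIALS = {
    "\\": r"\textbackslash{}",
    "&": r"\&",
    "%": r"\%",
    "$": r"\$",
    "#": r"\#",
    "_": r"\_",
    "{": r"\{",
    "}": r"\}",
    "~": r"\textasciitilde{}",
    "^": r"\textasciicircum{}",
}


def escape_latex(text):
    """Escape one character at a time so replacements are never re-escaped."""
    return "".join(LATEX_SPECIALS.get(ch, ch) for ch in str(text))


def datasheet_to_latex(answers):
    """Render a {question: answer} mapping (hypothetical format) as a LaTeX table."""
    lines = [
        r"\begin{tabular}{p{0.45\linewidth}p{0.45\linewidth}}",
        r"\hline",
        r"Question & Answer \\",
        r"\hline",
    ]
    for question, answer in answers.items():
        lines.append(f"{escape_latex(question)} & {escape_latex(answer)} \\\\")
    lines += [r"\hline", r"\end{tabular}"]
    return "\n".join(lines)
```

Escaping character by character, rather than with chained `str.replace` calls, avoids the classic bug of escaping the backslashes that earlier replacements introduced.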

artificial intelligence, experiment, natural language, (17 more...)

arXiv.org Artificial Intelligence

2412.0794

Country:

Asia > Japan > Honshū > Kantō > Tokyo Metropolis Prefecture > Tokyo (0.14)
Europe > Ireland > Leinster > County Dublin > Dublin (0.04)
North America > United States > New York > New York County > New York City (0.04)
(4 more...)

Genre:

Research Report (1.00)
Questionnaire & Opinion Survey (0.67)

Industry: Information Technology > Security & Privacy (0.47)

Technology: Information Technology > Artificial Intelligence > Natural Language (1.00)

What Kind of Writer Is ChatGPT?

The New Yorker | Oct-3-2024, 10:00:00 GMT

Last spring, a graduate student in social anthropology--let's call him Chris--sat down at his laptop and asked ChatGPT for help with a writing assignment. He pasted a few thousand words, a mix of rough summaries and jotted-down bullet points, into the text box that serves as ChatGPT's interface. "Here's my entire exam," he wrote. "Don't edit it, I will tell you what to do after you've read it." Chris was tackling a difficult paper about perspectivism, which is the anthropological principle that one's perspective inevitably shapes the observations one makes and the knowledge one acquires.

large language model, machine learning, natural language, (21 more...)

The New Yorker

Country:

North America > United States > Texas (0.05)
North America > United States > North Carolina (0.05)

Industry:

Education (0.55)
Health & Medicine (0.49)

Technology:

Information Technology > Artificial Intelligence > Natural Language > Chatbot (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (0.92)

PostDoc: Generating Poster from a Long Multimodal Document Using Deep Submodular Optimization

Jaisankar, Vijay, Bandyopadhyay, Sambaran, Vyas, Kalp, Chaitanya, Varre, Somasundaram, Shwetha

arXiv.org Artificial Intelligence | May-30-2024

A poster generated from a long input document can be considered a one-page, easy-to-read multimodal (text and images) summary presented on a well-designed template. Automatically transforming a long document into a poster is a little-studied but challenging task. It involves summarizing the content of the input document, followed by template generation and harmonization. In this work, we propose a novel deep submodular function that can be trained on ground-truth summaries to extract multimodal content from the document and that explicitly ensures good coverage, diversity and alignment of text and images. We then use an LLM-based paraphraser and generate a template with various design aspects conditioned on the input content. We show the merits of our approach through extensive automated and human evaluations.
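Submodular content selection is usually run with the greedy algorithm: repeatedly add the item with the largest marginal gain until the budget is used up, which for a monotone submodular objective under a cardinality budget is guaranteed to reach at least a (1 - 1/e) fraction of the optimum. The sketch below uses a fixed facility-location function, f(S) = sum over items i of the maximum similarity between i and any selected item, as a stand-in for the paper's learned deep submodular function; the similarity matrix and the item budget are hypothetical inputs (a real poster pipeline would derive similarities from text and image embeddings).

```python
def facility_location(selected, sim):
    """f(S) = sum_i max_{j in S} sim[i][j]; monotone submodular when sim >= 0."""
    if not selected:
        return 0.0
    return sum(max(row[j] for j in selected) for row in sim)


def greedy_select(sim, budget):
    """Greedily pick up to `budget` items, each time maximizing the marginal gain."""
    selected = []
    remaining = set(range(len(sim)))
    while remaining and len(selected) < budget:
        base = facility_location(selected, sim)
        # sorted() makes tie-breaking deterministic (lowest index wins)
        best = max(sorted(remaining),
                   key=lambda j: facility_location(selected + [j], sim) - base)
        selected.append(best)
        remaining.remove(best)
    return selected
```

With two pairs of near-duplicate items (say sections 0 and 1 are similar, as are 2 and 3), a budget of 2 picks one item from each pair: once a representative of a pair is chosen, its twin adds almost nothing, which is how coverage and diversity fall out of the same objective.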

postdoc, submodular function, summarization, (17 more...)

arXiv.org Artificial Intelligence

2405.20213

Country:

North America > United States > Oregon > Multnomah County > Portland (0.04)
Europe > Germany > Berlin (0.04)
Europe > Germany > Bavaria > Upper Bavaria > Munich (0.04)
(3 more...)

Genre: Research Report (0.40)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Natural Language > Large Language Model (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks > Deep Learning (0.94)

Ensemble Learning for Vietnamese Scene Text Spotting in Urban Environments

Nguyen, Hieu, Ta, Cong-Hoang, Le-Nguyen, Phuong-Thuy, Tran, Minh-Triet, Le, Trung-Nghia

arXiv.org Artificial Intelligence | Mar-31-2024

This paper presents a simple yet efficient ensemble learning framework for Vietnamese scene text spotting. Leveraging ensemble learning, which combines multiple models to yield more accurate predictions, our approach aims to significantly enhance the performance of scene text spotting in challenging urban settings. In experimental evaluations on the VinText dataset, our proposed method improves accuracy by 5% over existing methods. These results demonstrate the efficacy of ensemble learning for Vietnamese scene text spotting in urban environments and highlight its potential for real-world applications, such as text detection and recognition in urban signage, advertisements, and other text-rich urban scenes.
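The abstract does not say how the member models' outputs are combined, so the sketch below shows one generic way to ensemble text spotters, offered as an assumption rather than the paper's method: cluster detections from all models by intersection-over-union (IoU), keep clusters that enough models agree on, average their box coordinates, and take a majority vote over the recognized strings. Boxes are axis-aligned `(x1, y1, x2, y2)` tuples; the thresholds and example inputs are hypothetical.

```python
from collections import Counter


def _area(r):
    """Area of an axis-aligned box (x1, y1, x2, y2); zero if degenerate."""
    return max(0, r[2] - r[0]) * max(0, r[3] - r[1])


def iou(a, b):
    """Intersection-over-union of two axis-aligned boxes."""
    inter = _area((max(a[0], b[0]), max(a[1], b[1]), min(a[2], b[2]), min(a[3], b[3])))
    union = _area(a) + _area(b) - inter
    return inter / union if union > 0 else 0.0


def ensemble_spots(predictions, iou_thr=0.5, min_votes=2):
    """predictions: one list of (box, text) pairs per model."""
    clusters = []
    for model_preds in predictions:
        for box, text in model_preds:
            # Join the first cluster whose seed box overlaps enough, else start a new one
            for cluster in clusters:
                if iou(cluster[0][0], box) >= iou_thr:
                    cluster.append((box, text))
                    break
            else:
                clusters.append([(box, text)])
    fused = []
    for cluster in clusters:
        if len(cluster) < min_votes:
            continue  # too few models agree that text is here
        n = len(cluster)
        box = tuple(sum(b[i] for b, _ in cluster) / n for i in range(4))
        text = Counter(t for _, t in cluster).most_common(1)[0][0]
        fused.append((box, text))
    return fused
```

For Vietnamese, the vote over whole strings is what lets the ensemble recover diacritics that a single model drops: if two models read "PHỞ" and one reads "PHO", the fused result keeps "PHỞ".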

recognition, scene text, text recognition, (14 more...)

arXiv.org Artificial Intelligence

doi: 10.1109/rivf60135.2023.10471878

2404.00852

Country: Asia > Vietnam > Hồ Chí Minh City > Hồ Chí Minh City (0.04)

Genre: Research Report (1.00)

Technology:

Information Technology > Data Science > Data Mining (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (0.70)

Evaluation Metrics for Automated Typographic Poster Generation

Rebelo, Sérgio M., Merelo, J. J., Bicker, João, Machado, Penousal

arXiv.org Artificial Intelligence | Feb-10-2024

Computational Design approaches facilitate the generation of typographic design, but evaluating these designs remains a challenging task. In this paper, we propose a set of heuristic metrics for typographic design evaluation, focusing on three aspects: legibility, which assesses text visibility; aesthetics, which evaluates the visual quality of the design; and semantic features, which estimate how effectively the design conveys the content semantics. We experiment with a constrained evolutionary approach for generating typographic posters, incorporating the proposed evaluation metrics with varied setups and treating the legibility metrics as constraints. We also integrate emotion recognition to identify text semantics automatically, and we analyse the performance of the approach and the visual characteristics of its outputs.
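Treating legibility as a constraint rather than as one more term in the fitness can be done with a feasibility-first ranking: every layout that meets the legibility threshold outranks every layout that does not, feasible layouts are ordered by an aesthetic score, and infeasible ones by how close they come to being legible. The sketch below is a minimal version of that selection step. Both metrics are hypothetical proxies, not the paper's: legibility is the share of text boxes that lie inside the canvas and overlap no other box, and aesthetics is a horizontal-balance score.

```python
def _area(b):
    """Area of an axis-aligned text box (x1, y1, x2, y2)."""
    return max(0, b[2] - b[0]) * max(0, b[3] - b[1])


def _overlap(a, b):
    """Overlapping area of two boxes (0 if they only touch or are apart)."""
    w = min(a[2], b[2]) - max(a[0], b[0])
    h = min(a[3], b[3]) - max(a[1], b[1])
    return max(0, w) * max(0, h)


def legibility(boxes, width, height):
    """Hypothetical proxy in [0, 1]: share of boxes inside the canvas and not overlapped."""
    if not boxes:
        return 0.0
    legible = 0
    for i, b in enumerate(boxes):
        inside = b[0] >= 0 and b[1] >= 0 and b[2] <= width and b[3] <= height
        clear = all(_overlap(b, o) == 0 for j, o in enumerate(boxes) if j != i)
        if inside and clear:
            legible += 1
    return legible / len(boxes)


def balance(boxes, width):
    """Hypothetical aesthetic proxy: 1.0 when the area-weighted centre is mid-canvas."""
    total = sum(_area(b) for b in boxes)
    if total == 0:
        return 0.0
    cx = sum(_area(b) * (b[0] + b[2]) / 2 for b in boxes) / total
    return max(0.0, 1 - abs(cx - width / 2) / (width / 2))


def rank_key(boxes, width, height, min_legibility=1.0):
    """Feasible layouts first (ranked by aesthetics), then infeasible ones by legibility."""
    leg = legibility(boxes, width, height)
    feasible = leg >= min_legibility
    return (feasible, balance(boxes, width) if feasible else leg)


def select(population, k, width, height):
    """Keep the k best layouts for the next generation."""
    return sorted(population, key=lambda b: rank_key(b, width, height), reverse=True)[:k]
```

In a full evolutionary loop, `select` would be applied after mutation and crossover each generation. Using a `(feasible, score)` tuple as the sort key is what enforces the constraint: `True` sorts above `False` before the score is ever compared.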

alignment, metric, text box, (14 more...)

arXiv.org Artificial Intelligence

2402.06945

Country:

Europe > Portugal > Coimbra > Coimbra (0.04)
Europe > Spain > Andalusia > Granada Province > Granada (0.04)
Europe > Italy (0.04)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Natural Language (0.93)
Information Technology > Artificial Intelligence > Machine Learning > Evolutionary Systems (0.49)

How to create stickers on the iPhone using your photos in iOS 17

Engadget | Dec-20-2023, 13:30:39 GMT

Creating stickers from photos is an easily overlooked iPhone feature tucked into iOS 17. Using Apple's machine learning algorithms that quickly separate a subject from its background, it extracts pictures of you, your friends or pets (or anything else it detects as the picture's subject), transforming them into digital decals. It even makes animated stickers from Live Photos to slap onto iMessage chats or Markup tools. Here's how to create your own. In Apple's ecosystem, stickers are digital versions of their real-world counterparts. They debuted in iOS 10, Apple's 2016 iPhone operating system, allowing users to place cut-outs of fun images onto iMessage bubbles for more personalized reactions.

iphone, menu, sticker, (15 more...)

Engadget

Technology:

Information Technology > Artificial Intelligence > Machine Learning (0.91)
Information Technology > Communications > Mobile (0.89)

A Graphical Approach to Document Layout Analysis

Wang, Jilin, Krumdick, Michael, Tong, Baojia, Halim, Hamima, Sokolov, Maxim, Barda, Vadym, Vendryes, Delphine, Tanner, Chris

arXiv.org Artificial Intelligence | Aug-3-2023

Document layout analysis (DLA) is the task of detecting the distinct, semantic content within a document and correctly classifying these items into an appropriate category (e.g., text, title, figure). DLA pipelines enable users to convert documents into structured machine-readable formats that can then be used for many useful downstream tasks. Most existing state-of-the-art (SOTA) DLA models represent documents as images, discarding the rich metadata available in electronically generated PDFs. Directly leveraging this metadata, we represent each PDF page as a structured graph and frame the DLA problem as a graph segmentation and classification problem. We introduce the Graph-based Layout Analysis Model (GLAM), a lightweight graph neural network competitive with SOTA models on two challenging DLA datasets while being an order of magnitude smaller than existing models. In particular, the 4-million parameter GLAM model outperforms the leading 140M+ parameter computer vision-based model on 5 of the 11 classes on the DocLayNet dataset. A simple ensemble of these two models achieves a new state-of-the-art on DocLayNet, increasing mAP from 76.8 to 80.8. Overall, GLAM is over 5 times more efficient than SOTA models, making GLAM a favorable engineering choice for DLA tasks.
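Framing layout analysis as graph segmentation means: make each text box from the PDF metadata a node, connect nearby boxes with edges, decide for each edge whether its two ends belong to the same region, and read off regions as connected components of the kept edges. The sketch below follows that outline under stated assumptions: edges come from k-nearest neighbours by box centre, and a hypothetical hand-written rule (small vertical gap plus horizontal overlap) stands in for the edge classification that GLAM performs with a graph neural network. Box extraction from the PDF is assumed to have happened already.

```python
import math


def center(b):
    """Centre point of an axis-aligned box (x1, y1, x2, y2)."""
    return ((b[0] + b[2]) / 2, (b[1] + b[3]) / 2)


def knn_edges(boxes, k=2):
    """Undirected edges from each box to its k nearest boxes by centre distance."""
    centres = [center(b) for b in boxes]
    edges = set()
    for i, ci in enumerate(centres):
        nearest = sorted((math.dist(ci, cj), j) for j, cj in enumerate(centres) if j != i)
        for _, j in nearest[:k]:
            edges.add((min(i, j), max(i, j)))
    return edges


def same_segment(a, b, max_gap=5):
    """Hypothetical stand-in for a learned edge classifier."""
    top, bottom = (a, b) if a[1] <= b[1] else (b, a)
    vertical_gap = bottom[1] - top[3]
    horizontal_overlap = min(a[2], b[2]) - max(a[0], b[0])
    return vertical_gap <= max_gap and horizontal_overlap > 0


def segments(n, kept_edges):
    """Connected components over kept edges (union-find), as sorted index lists."""
    parent = list(range(n))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for i, j in kept_edges:
        parent[find(i)] = find(j)
    groups = {}
    for i in range(n):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())
```

For a page with a title line above three tightly spaced body lines, the title is linked to the first body line by the k-NN step, but the large gap makes `same_segment` drop that edge, so the page splits into `[[0], [1, 2, 3]]`: one title region and one paragraph region, each of which a classifier would then label.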

data mining, machine learning, natural language, (20 more...)

arXiv.org Artificial Intelligence

2308.02051

Country:

North America > United States > California > Los Angeles County > Los Angeles (0.14)
North America > United States > Massachusetts > Middlesex County > Cambridge (0.04)
North America > United States > New York > New York County > New York City (0.04)
(2 more...)

Genre: Research Report (0.50)

Technology:

Information Technology > Artificial Intelligence > Vision (1.00)
Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Data Science > Data Mining (0.94)
(2 more...)
